Humans are extremely good at recognizing speech in noisy environments and under variations in parameters such as pitch, frequency, and dialect. For a computer, however, speech recognition and phonetic classification are extremely difficult tasks. For instance, mel-frequency cepstral coefficients (MFCCs), currently among the most accurate representations of speech features, have not achieved human performance in various speech-related tasks. In addition, multiple models developed for speech recognition and classification achieve high performance, but they remain imperfect and are strongly affected by noise and by modifications of phonetic parameters. In this paper, we deal only with phoneme classification. We compare and combine classifiers in an effort to achieve error rates that are as low as possible.
